AITopics | answer extraction

Collaborating Authors

answer extraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction

Lawrence, Logan, Saha, Oindrila, Wei, Megan, Sun, Chen, Maji, Subhransu, Van Horn, Grant

arXiv.org Artificial IntelligenceDec-10-2025

Despite the renewed interest in zero-shot visual classification due to the rise of Multimodal Large Language Models (MLLMs), the problem of evaluating free-form responses of auto-regressive models remains a persistent challenge. Most existing works focus on language-only tasks or don't consider Multiple Choice Questions (MCQs) beyond 5-way options, both of which are critical capabilities to solve tasks in Fine-Grained Visual Classification (FGVC) where choice counts are in the hundreds to thousands and the choices are highly related. Furthermore, in this highly multi-way MCQ setting it is not clear how to extend LLM choice extraction to retrieval-based problems, where computing probabilities over the choice set is computationally costly. In this work we investigate nlg2choice, a simple two-stage method which first asks the MLLM an open-ended question for the task with minimal constraints, then uses text-only constrained decoding to predict the most likely choice. In retrieval settings, we compute the probability of the constrained response taking that choice with an early stopping method to significantly improve throughput. Our results show improvement over a suite of seven fine-grained visual datasets when evaluating in terms of classification and retrieval, and show that this performance holds over the various ways that users of LLMs can implement tasks in natural language.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.14885

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Two-Stage Quranic QA via Ensemble Retrieval and Instruction-Tuned Answer Extraction

Basem, Mohamed, Oshallah, Islam, Hamdi, Ali, Shaban, Khaled, Kassab, Hozaifa

arXiv.org Artificial IntelligenceSep-5-2025

--Quranic Question Answering presents unique challenges due to the linguistic complexity of Classical Arabic and the semantic richness of religious texts. In this paper, we propose a novel two-stage framework that addresses both passage retrieval and answer extraction. For passage retrieval, we ensemble fine-tuned Arabic language models to achieve superior ranking performance. For answer extraction, we employ instruction-tuned large language models with few-shot prompting to overcome the limitations of fine-tuning on small datasets. Our approach achieves state-of-the-art results on the Quran QA 2023 Shared T ask, with a MAP@10 of 0.3128 and MRR@10 of 0.5763 for retrieval, and a pAP@10 of 0.669 for extraction, substantially outperforming previous methods. These results demonstrate that combining model ensembling and instruction-tuned language models effectively addresses the challenges of low-resource question answering in specialized domains. The Holy Qur'an, revealed over 1,400 years ago, remains the primary source of guidance for over 1.8 billion Muslims worldwide. Beyond its religious significance, the Qur'an represents a masterpiece of Classical Arabic literature, containing profound linguistic, historical, and ethical insights that continue to be studied by scholars across multiple disciplines [1].

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2508.06971

Country: Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering

Molfese, Francesco Maria, Moroni, Luca, Gioffrè, Luca, Scirè, Alessandro, Conia, Simone, Navigli, Roberto

arXiv.org Artificial IntelligenceMar-19-2025

One of the most widely used tasks to evaluate Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA). While open-ended question answering tasks are more challenging to evaluate, MCQA tasks are, in principle, easier to assess, as the model's answer is thought to be simple to extract and is directly compared to a set of predefined choices. However, recent studies have started to question the reliability of MCQA evaluation, showing that multiple factors can significantly impact the reported performance of LLMs, especially when the model generates free-form text before selecting one of the answer choices. In this work, we shed light on the inconsistencies of MCQA evaluation strategies, which can lead to inaccurate and misleading model comparisons. We systematically analyze whether existing answer extraction methods are aligned with human judgment, and how they are influenced by answer constraints in the prompt across different domains. Our experiments demonstrate that traditional evaluation strategies often underestimate LLM capabilities, while LLM-based answer extractors are prone to systematic errors. Moreover, we reveal a fundamental trade-off between including format constraints in the prompt to simplify answer extraction and allowing models to generate free-form text to improve reasoning. Our findings call for standardized evaluation methodologies and highlight the need for more reliable and consistent MCQA evaluation practices.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.14996

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
Oceania > Australia (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Improving Clinical Question Answering with Multi-Task Learning: A Joint Approach for Answer Extraction and Medical Categorization

Pattnayak, Priyaranjan, Patel, Hitesh Laxmichand, Agarwal, Amit, Kumar, Bhargava, Panda, Srikant, Kumar, Tejaswini

arXiv.org Artificial IntelligenceFeb-18-2025

Clinical Question Answering (CQA) plays a crucial role in medical decision-making, enabling physicians to extract relevant information from Electronic Medical Records (EMRs). While transformer-based models such as BERT, BioBERT, and ClinicalBERT have demonstrated state-of-the-art performance in CQA, existing models lack the ability to categorize extracted answers, which is critical for structured retrieval, content filtering, and medical decision support. To address this limitation, we introduce a Multi-Task Learning (MTL) framework that jointly trains CQA models for both answer extraction and medical categorization. In addition to predicting answer spans, our model classifies responses into five standardized medical categories: Diagnosis, Medication, Symptoms, Procedure, and Lab Reports. This categorization enables more structured and interpretable outputs, making clinical QA models more useful in real-world healthcare settings. We evaluate our approach on emrQA, a large-scale dataset for medical question answering. Results show that MTL improves F1-score by 2.2% compared to standard fine-tuning, while achieving 90.7% accuracy in answer categorization. These findings suggest that MTL not only enhances CQA performance but also introduces an effective mechanism for categorization and structured medical information retrieval.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2502.13108

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)

Add feedback

Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring

Akella, Amogh

arXiv.org Artificial IntelligenceDec-21-2024

In this paper, we explore how to leverage large language models (LLMs) to solve mathematical problems efficiently and accurately. Specifically, we demonstrate the effectiveness of classifying problems into distinct categories and employing category-specific problem-solving strategies to improve the mathematical performance of LLMs. We design a simple yet intuitive machine learning model for problem categorization and show that its accuracy can be significantly enhanced through the development of well-curated training datasets. Additionally, we find that the performance of this simple model approaches that of state-of-the-art (SOTA) models for categorization. Moreover, the accuracy of SOTA models also benefits from the use of improved training data. Finally, we assess the advantages of using category-specific strategies when prompting LLMs and observe significantly better performance compared to non-tailored approaches.

category, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.00042

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering

Zamil, Parvez, Rabby, Gollam, Rahman, Md. Sadekur, Auer, Sören

arXiv.org Artificial IntelligenceOct-29-2024

The growing volume of biomedical scholarly document abstracts presents an increasing challenge in efficiently retrieving accurate and relevant information. To address this, we introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization. Complementing this, we employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction. Our approach is evaluated across three configurations: scholarly document abstract retrieval, gold-standard scholarly documents abstract, and gold-standard snippets, consistently outperforming established methods such as RYGH and bio-answer finder. Notably, we demonstrate that extracting answers from scholarly documents abstracts alone can yield high accuracy, underscoring the sufficiency of abstracts for many biomedical queries. Despite its compact size, MiniLM exhibits competitive performance, challenging the prevailing notion that only large, resource-intensive models can handle such complex tasks. Our results, validated across various question types and evaluation batches, highlight the robustness and adaptability of our method in real-world biomedical applications. While our approach shows promise, we identify challenges in handling complex list-type questions and inconsistencies in evaluation metrics. Future work will focus on refining the topic model with more extensive domain-specific datasets, further optimizing MiniLM and utilizing large language models (LLM) to improve both precision and efficiency in biomedical question answering.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.00041

Country:

Asia > Middle East > Jordan (0.04)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Unsupervised Candidate Answer Extraction through Differentiable Masker-Reconstructor Model

Wang, Zhuoer, Wang, Yicheng, Zhu, Ziwei, Caverlee, James

arXiv.org Artificial IntelligenceOct-19-2023

Question generation is a widely used data augmentation approach with extensive applications, and extracting qualified candidate answers from context passages is a critical step for most question generation systems. However, existing methods for candidate answer extraction are reliant on linguistic rules or annotated data that face the partial annotation issue and challenges in generalization. To overcome these limitations, we propose a novel unsupervised candidate answer extraction approach that leverages the inherent structure of context passages through a Differentiable Masker-Reconstructor (DMR) Model with the enforcement of self-consistency for picking up salient information tokens. We curated two datasets with exhaustively-annotated answers and benchmark a comprehensive set of supervised and unsupervised candidate answer extraction methods. We demonstrate the effectiveness of the DMR model by showing its performance is superior among unsupervised methods and comparable to supervised methods.

candidate answer, computational linguistic, context passage, (16 more...)

arXiv.org Artificial Intelligence

2310.13106

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > France (0.04)
(12 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Add feedback

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Lovenia, Holy, Wilie, Bryan, Chung, Willy, Zeng, Min, Cahyawijaya, Samuel, Dan, Su, Fung, Pascale

arXiv.org Artificial IntelligenceSep-12-2022

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides performance lift by adapting unlabelled data to downstream task. Unfortunately, existing adaptations mainly involve deterministic rules that cannot generalize well. Here, we propose Clozer, a sequence-tagging based cloze answer extraction method used in TAPT that is extendable for adaptation on any cloze-style machine reading comprehension (MRC) downstream tasks. We experiment on multiple-choice cloze-style MRC tasks, and show that Clozer performs significantly better compared to the oracle and state-of-the-art in escalating TAPT effectiveness in lifting model performance, and prove that Clozer is able to recognize the gold answers independently of any heuristics.

computational linguistic, downstream task, gold answer, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2022.repl4nlp-1.7

2203.16027

Country:

Asia > China > Hong Kong (0.05)
Europe > United Kingdom > Wales (0.04)
Europe > United Kingdom > Scotland > City of Aberdeen > Aberdeen (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Education > Assessment & Standards > Student Performance (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

Van, Hoang, Yadav, Vikas, Surdeanu, Mihai

arXiv.org Artificial IntelligenceJun-8-2021

We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of a MRC system on the augmented data that contains approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the QA method components in narrowing the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context of the answers and additional training data. In particular, our method significantly improves the performance of BERT based retriever (15.12\%), and answer extractor (4.33\% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9\% exact match (EM) and 2.7\% F1 for answer extraction on PolicyQA, another practical but moderate sized QA dataset that also contains long answer spans.

answer span, augmentation, techqa, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3404835.3463099

2106.04134

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
North America > Canada (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

End-to-End Training of Neural Retrievers for Open-Domain Question Answering

Sachan, Devendra Singh, Patwary, Mostofa, Shoeybi, Mohammad, Kant, Neel, Ping, Wei, Hamilton, William L, Catanzaro, Bryan

arXiv.org Artificial IntelligenceJan-2-2021

Recent work on training neural retrievers for open-domain question answering (OpenQA) has employed both supervised and unsupervised approaches. However, it remains unclear how unsupervised and supervised methods can be used most effectively for neural retrievers. In this work, we systematically study retriever pre-training. We first propose an approach of unsupervised pre-training with the Inverse Cloze Task and masked salient spans, followed by supervised finetuning using question-context pairs. This approach leads to absolute gains of 2+ points over the previous best result in the top-20 retrieval accuracy on Natural Questions and TriviaQA datasets. We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models. In the first approach, the reader considers each retrieved document separately while in the second approach, the reader considers all the retrieved documents together. Our experiments demonstrate the effectiveness of these approaches as we obtain new state-of-the-art results. On the Natural Questions dataset, we obtain a top-20 retrieval accuracy of 84, an improvement of 5 points over the recent DPR model. In addition, we achieve good results on answer extraction, outperforming recent models like REALM and RAG by 3+ points. We further scale up end-to-end training to large models and show consistent gains in performance over smaller models.

configuration, retrieval accuracy, retriever, (15 more...)

arXiv.org Artificial Intelligence

2101.00408

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > California (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)

Add feedback